All SNPs and short indels from the 1000 Genomes Project were submitted to dbSNP, and longer structural variants were submitted to the DGVa.
Where possible, release VCF files contain the appropriate IDs in the ID column, such as dbSNP rs IDs.
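As an illustration of where these IDs live, the sketch below reads VCF data lines and reports the dbSNP rs ID, if any, from the ID column (the third tab-separated column). Per the VCF specification, a `.` in that column means no ID was assigned, and multiple IDs are separated by semicolons. The function names and the example records are hypothetical and made up for illustration; for real release files, a dedicated VCF library or `bcftools` is usually a better choice than hand parsing.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Site(NamedTuple):
    chrom: str
    pos: int
    ids: Tuple[str, ...]  # all IDs from the ID column; empty if the column is "."
    ref: str
    alt: str


def parse_sites(lines: Iterable[str]) -> Iterator[Site]:
    """Yield one Site per VCF data line, skipping blank and '#' header lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, id_field, ref, alt = line.rstrip("\n").split("\t")[:5]
        ids = () if id_field == "." else tuple(id_field.split(";"))
        yield Site(chrom, int(pos), ids, ref, alt)


def rs_id(site: Site) -> Optional[str]:
    """Return the first dbSNP rs ID in the ID column, or None if there is none."""
    for ident in site.ids:
        if ident.startswith("rs"):
            return ident
    return None


# Made-up records for illustration only (not taken from a real release file).
EXAMPLE = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\trs123\tA\tG\t100\tPASS\t.",
    "1\t2000\t.\tT\tTA\t100\tPASS\t.",
]

if __name__ == "__main__":
    for site in parse_sites(EXAMPLE):
        print(site.chrom, site.pos, rs_id(site) or "no rs ID")
```

Run on the example, this prints `1 1000 rs123` followed by `1 2000 no rs ID`.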
The archives contain variants discovered in the final phase of the 1000 Genomes Project (phase 3), as well as in the earlier pilot and phase 1 stages. Because methods were refined over the course of the project, phase 3 represents the final data set.
No. Not all of the variants shown in the browsers produced by the 1000 Genomes Project were discovered by the 1000 Genomes Project.
Data from the 1000 Genomes Project is available in a number of browsers, including the project's own browsers, which reflect the major data releases associated with the pilot, phase 1 and phase 3 publications. More information is available on the browsers page.
The 1000 Genomes Project browsers, maintained during the project, are based on custom versions of the Ensembl browser. Their databases contain the Ensembl core features (genes and transcripts), regulatory elements from the Ensembl Regulatory Build and variation data from the Ensembl Variation database.
As well as 1000 Genomes Project variation data, the Ensembl Variation database contains data from dbSNP, ClinVar, COSMIC, dbGaP, dbVar, EGA and many other sources.
This can be done using Ensembl's BioMart.
This YouTube video gives a tutorial on how to do it.
The basic steps are:
For coordinates on GRCh38, use the main Ensembl site; for coordinates on GRCh37, use the dedicated GRCh37 site.
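For scripted use, BioMart also accepts queries over its `martservice` web endpoint. The sketch below, assuming the task is looking up chromosome and position for a list of rs IDs, builds such a query and picks the endpoint by assembly. The endpoint URLs, the dataset name `hsapiens_snp`, the filter `snp_filter` and the attribute names are our assumptions about the current BioMart configuration; confirm them with the "XML" export button in the BioMart web interface before relying on them. The helper function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

# Assumed martservice endpoints: main Ensembl site (GRCh38) and the GRCh37 site.
MART_URLS = {
    "GRCh38": "https://www.ensembl.org/biomart/martservice",
    "GRCh37": "https://grch37.ensembl.org/biomart/martservice",
}
PROLOG = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def build_query_xml(rs_ids: List[str]) -> str:
    """Build the BioMart <Query> element asking for rs ID, chromosome and positions.

    Dataset, filter and attribute names are assumptions; check them in BioMart.
    """
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    dataset = ET.SubElement(query, "Dataset", name="hsapiens_snp", interface="default")
    ET.SubElement(dataset, "Filter", name="snp_filter", value=",".join(rs_ids))
    for attr in ("refsnp_id", "chr_name", "chrom_start", "chrom_end"):
        ET.SubElement(dataset, "Attribute", name=attr)
    return ET.tostring(query, encoding="unicode")


def build_query_url(rs_ids: List[str], assembly: str = "GRCh38") -> str:
    """Return a GET URL on the site matching the assembly (KeyError if unknown)."""
    xml_text = PROLOG + build_query_xml(rs_ids)
    return MART_URLS[assembly] + "?" + urllib.parse.urlencode({"query": xml_text})


def fetch_coordinates(rs_ids: List[str], assembly: str = "GRCh38") -> str:
    """Send the query and return the raw TSV response (needs network access)."""
    with urllib.request.urlopen(build_query_url(rs_ids, assembly), timeout=60) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_coordinates(["rs123", "rs456"], assembly="GRCh37")` would send the query to the GRCh37 site. Routing by assembly in one place keeps the GRCh38/GRCh37 choice explicit rather than buried in a hard-coded URL.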
Ensembl and the UCSC Genome Browser both import their variant data from dbSNP. When new 1000 Genomes variants are released, it can take some time for them to be accessioned by dbSNP and to make their way into the browsers.
When this happens, we try to ensure that a version of our own browser displays the data in the meantime. Both Ensembl and UCSC also support attaching VCF files for visualisation.
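As one example of attaching a VCF, the UCSC Genome Browser accepts a custom track line of type `vcfTabix` pointing at a bgzip-compressed, tabix-indexed VCF via `bigDataUrl`; by default it looks for the `.tbi` index at the same URL with `.tbi` appended. The sketch below only builds that track line. The function name and the example URL are hypothetical, and the checks are simple sanity checks rather than the browser's own validation rules.

```python
def ucsc_vcf_track_line(data_url: str, name: str, description: str = "") -> str:
    """Build a UCSC custom track line for a remotely hosted, tabix-indexed VCF."""
    if not data_url.startswith(("http://", "https://", "ftp://")):
        raise ValueError("the browser must be able to fetch the file over HTTP(S) or FTP")
    if not data_url.endswith(".vcf.gz"):
        raise ValueError("expected a bgzip-compressed VCF ending in .vcf.gz")
    if '"' in name or '"' in description:
        raise ValueError("name and description must not contain double quotes")
    parts = ["track", "type=vcfTabix", f'name="{name}"']
    if description:
        parts.append(f'description="{description}"')
    parts.append(f"bigDataUrl={data_url}")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical URL; the matching index would be expected at ...new_sites.vcf.gz.tbi
    print(ucsc_vcf_track_line("https://example.org/data/new_sites.vcf.gz", "New sites"))
```

This prints `track type=vcfTabix name="New sites" bigDataUrl=https://example.org/data/new_sites.vcf.gz`, which can be pasted into the UCSC custom track form.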
The 1000 Genomes Project submits all of its variants to archives such as dbSNP and the DGVa. If a variant has not yet appeared in dbSNP, it is most likely a new site that we have not yet submitted. It may also be an older site that we subsequently identified as a false discovery and then suppressed.
Regarding our overlap with the HapMap site list: the majority of HapMap SNPs are found in the 1000 Genomes Project data. There will be a small number of sites that we fail to find using next generation sequencing, but most HapMap sites that are not found by the 1000 Genomes Project will be false discoveries by HapMap. Conversely, many SNPs from the 1000 Genomes Project and other next generation sequencing projects are not part of HapMap, because HapMap was based on an older genotyping technology, from a time when such rapid variant discovery by sequencing was not possible.
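The overlap described above is essentially a set comparison between two site lists. The sketch below, assuming both lists have already been placed on the same genome assembly (which may require lifting one of them over) and matching sites by chromosome and position only (ignoring alleles and strand, a deliberate simplification), reports how many HapMap sites are found and which are not. The function name and the toy site lists are made up for illustration; they are not real HapMap or 1000 Genomes counts.

```python
from typing import Dict, Iterable, List, Set, Tuple, Union

SiteKey = Tuple[str, int]  # (chromosome, 1-based position) on one shared assembly


def overlap_summary(hapmap: Iterable[SiteKey],
                    kg: Iterable[SiteKey]) -> Dict[str, Union[int, float, List[SiteKey]]]:
    """Compare two site lists by (chromosome, position) and summarise the overlap."""
    hapmap_set: Set[SiteKey] = set(hapmap)
    kg_set: Set[SiteKey] = set(kg)
    shared = hapmap_set & kg_set
    return {
        "hapmap_total": len(hapmap_set),
        "shared": len(shared),
        # HapMap sites not seen by sequencing: candidates for HapMap false discoveries
        "hapmap_only": sorted(hapmap_set - kg_set),
        # sequencing-only sites, e.g. variants HapMap's genotyping could not target
        "kg_only_count": len(kg_set - hapmap_set),
        "fraction_hapmap_found": len(shared) / len(hapmap_set) if hapmap_set else 0.0,
    }


# Toy, made-up site lists for illustration only.
HAPMAP_TOY = [("1", 100), ("1", 200), ("1", 300), ("2", 50)]
KG_TOY = [("1", 100), ("1", 200), ("2", 50), ("2", 75), ("3", 10)]

if __name__ == "__main__":
    print(overlap_summary(HAPMAP_TOY, KG_TOY))
```

On the toy lists, 3 of the 4 HapMap sites are shared (a fraction of 0.75), `("1", 300)` is the only HapMap-only site, and 2 sites appear only in the sequencing list.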